home *** CD-ROM | disk | FTP | other *** search
- CHAPTER 4 ELEMENTS OF THE A86 LANGUAGE
-
- This chapter begins the description of the A86 language. It's a
- bit more tutorial in nature than the rest of the manual. I'll
- start by describing the elementary building blocks of the
- language.
-
-
- General Categories of A86 Elements
-
- The statements in an A86 source file can be classified in three
- general categories: instruction statements, data allocation
- statements, and assembler directives. An instruction statement
- uses an easily remembered name (a mnemonic) and possibly one or
- more operands to specify a machine instruction to be generated.
- A data allocation statement reserves, and optionally initializes,
- memory space for program data. An assembler directive is a
- statement that gives special instructions to the assembler.
- Directives are unlike the instruction and data allocation
- statements in that they do not specify the actual contents of
- memory. Examples of the three types of A86 statements are given
- below. These are provided to give you a general idea of what the
- different kinds of statements look like.
-
- Instruction Statements
-
- MOV AX,BX
- CALL SORT_PROCEDURE
- ADD AL,7
-
- Data Allocation Statements
-
- A_VARIABLE DW 0
- DB 'HELLO'
-
- Assembler Directives
-
- CODE SEGMENT
- ITEM_COUNT EQU 5
-
- The statements in an A86 source file are made up of reserved
- symbols, user symbols, numbers, strings, special characters, and
- comments.
-
- Symbols are the "words" of the A86 language. All symbols are a
- collection of consecutive letters, numbers, and assorted special
- characters: _, @, $, and ?. Symbols cannot begin with digits:
- anything that begins with a digit is a number. Symbols can begin
- with any of the special characters just listed. Symbols can also
- begin with a period, which is the only place within the symbol
- name a period can appear.
- 4-2
-
- Reserved symbols have a built-in meaning to the assembler:
- instruction mnemonics (MOV, CALL), directive names (DB, STRUC),
- register names, expression operators, etc. User symbols have
- meanings defined by the programmer: program locations, variable
- names, equated constants, etc. The user symbol name is
- considered unique up to 127 characters, but it can be of any
- length (up to 255 characters). Examples of user symbols are:
- COUNT, L1, and A_BYTE.
-
- Numbers in A86 may be expressed as decimal, hexadecimal, octal,
- binary, or decimal "K". These must begin with a decimal digit
- and, except in the case of a decimal or hexadecimal number, must
- end with "x" followed by a letter identifying the base of the
- number. A number without an identifying base is hexadecimal if
- the first digit is 0; decimal if the first digit is 1 through 9.
- Examples of A86 numbers are: 123 (decimal), 0ABC (hexadecimal),
- 1776xQ (octal), 10100110xB (binary), and 32K (decimal 32 times
- 1024).
-
- Strings are characters enclosed in either single or double
- quotes. Examples of strings are: '1st string' and "SIGN-ON
- MESSAGE, V1.0". If you wish to include a quote mark within a
- string, you can double it; for example, 'that''s nice' specifies
- a single quote mark within the string. The single quote and
- double quote are two of many special characters used in the
- assembly language. Others, run together in a list, are: ! $ ? ;
- : = , [ ] . + - ( ) * / >. The space and tab characters are also
- special characters, used as separators in the assembly language.
-
- A comment is a sequence of characters used for program
- documentation only; it is ignored by the assembler. Comments
- begin with a semicolon (;) and run to the end of the line on
- which they are started. Examples of lines with comments are
- shown below:
-
- ; This entire line is a comment.
- MOV AX,BX ; This is a comment next to an instruction statement.
-
- Alternatively, for compatibility with other assemblers, I provide
- the COMMENT directive. The next non-blank character after
- COMMENT is a delimiter to a comment that can run across many
- lines; all text is ignored, until a second instance of the
- delimiter is seen. For example,
-
- COMMENT 'This comment
- runs across two lines'
-
- I don't like COMMENT, because I think it's very dangerous. If,
- for example, you have two COMMENTs in your program, and you
- forget to close the first one, the assembler will happily ignore
- all source code between the comments. If that source code does
- not happen to contain any labels referenced elsewhere, the error
- may not be detected until your program blows up. For multiline
- comments, I urge you to simply start each line with a semicolon.
- 4-3
-
- Statements in the A86 are line oriented, which means that
- statements may not be broken across line boundaries. A86 source
- lines may be entered in a free form fashion; that is, without
- regard to the column orientation of the symbols and special
- characters.
-
- PLEASE NOTE: Because an A86 line is free formatted, there is no
- need for you to put the operands to your instructions in a
- separate column. You organize things into columns when you want
- to visually scan down the column; and you practically never scan
- operands separate from their opcodes. Realizing this, you may
- wish to separate your operands from the mnemonic with a space
- instead of a tab, making the line less disjointed and hence
- easier to read. You will also have room for a longer comment
- after the instruction.
-
-
- Operand Typing and Code Generation
-
- A86 is a strongly typed assembly language. What this means is
- that operands to instructions (registers, variables, labels,
- constants) have a type attribute associated with them which tells
- the assembler something about them. For example, the operand 4
- has type "number", which tells the assembler that it is a
- numerical constant, rather than a register or an address in the
- code or data. The following discussion explains the types
- associated with instruction operands and how this type
- information is used to generate particular machine opcodes from
- general purpose instruction mnemonics.
-
- Registers
-
- The 8086 has 8 general purpose word (two-byte) registers:
- AX,BX,CX,DX,SI,DI,BP, and SP. The first four of those registers
- are subdivided into 8 general purpose one-byte registers
- AH,AL,BH,BL,CH,CL,DH, and DL. There are also 4 16-bit segment
- registers CS,DS,ES, and SS, used for addressing memory; and the
- implicit instruction-pointer register (referred to as IP,
- although "IP" is not part of the A86 assembly language).
-
- My A386 assembler supports the two additional segment registers
- FS and GS, plus the 32-bit general registers
- EAX,EBX,ECX,EDX,ESI,EDI,EBP, and ESP. The lower 16 bits of each
- 32-bit register is the corresponding 16-bit register (without the
- E in its name).
-
- Variables
-
- A variable is a unit of program data with a symbolic name,
- residing at a specific location in 8086 memory. A variable is
- given a type at the time it is defined, which indicates the
- number of bytes associated with its symbol. Variables defined
- with a DB statement are given type BYTE (one byte), and those
- defined with the DW statement are given type WORD (two bytes).
- Examples:
- 4-4
-
- BYTE_VAR DB 0 ; A byte variable.
- WORD_VAR DW 0 ; A word variable.
-
- Labels
-
- A label is a symbol referring to a location in the program code.
- It is defined as an identifier, followed by a colon (:), used to
- represent the location of a particular instruction or data
- structure. Such a label may be on a line by itself or it may
- immediately precede an instruction statement (on the same line).
- In the following example, LABEL_1 and LABEL_2 are both labels for
- the MOV AL,BL instruction.
-
- LABEL_1:
- LABEL_2: MOV AL,BL
-
- In the A86 assembly language, labels have a type identical to
- that of constants. Thus, the instruction MOV BX,LABEL_2 is
- accepted, and the code to move the immediate constant address of
- LABEL2 into BX, is generated.
-
- IMPORTANT: you must understand the distinction between a label
- and a variable, because you may generate a different instruction
- than you intended if you confuse them. For example, if you
- declare XXX: DW ?, the colon following the XXX means that XXX is
- a label; the instruction MOV SI,XXX moves the immediate constant
- address of XXX into the SI register. On the other hand, if you
- declare XXX DW ?, with no colon, then XXX is a word variable; the
- same instruction MOV SI,XXX now does something different: it
- loads the run-time value of the memory word XXX into the SI
- register. You can override the definition of a symbol in any
- usage with the immediate-value operator OFFSET or the
- memory-variable opertors B,W,D,Q, or T. Thus, MOV SI,OFFSET XXX
- loads the immediate value pointing to XXX no matter how XXX was
- declared; MOV SI,XXX W loads the word-variable at XXX no matter
- how XXX was declared.
-
- Constants
-
- A constant is a numerical value computed from an assembly-time
- expression. For example, 123 and 3 + 2 - 1 both represent
- constants. A constant differs from a variable in that it
- specifies a pure number, known by the assembler before the
- program is run, rather than a number fetched from memory when the
- program is running.
- 4-5
-
- Generating Opcodes from General Purpose Mnemonics
-
- My A86 assembly language is modeled after Intel's ASM86 language,
- which uses general purpose mnemonics to represent classes of
- machine instructions rather than having a different mnemonic for
- each opcode. For example, the MOV mnemonic is used for all of
- the following: move byte register to byte register, load word
- register from memory, load byte register with constant, move word
- register to memory, move immediate value to word register, move
- immediate value to memory, etc. This feature saves you from
- having to distinguish "move" from "load," "move constant" from
- "move memory," "move byte" from "move word," etc.
-
- Because the same general purpose mnemonic can apply to several
- different machine opcodes, A86 uses the type information
- associated with an instruction's operands in determining the
- particular opcode to produce. The type information associated
- with instruction operands is also used to discover programmer
- errors, such as attempting to move a word register to a byte
- register.
-
- The examples that follow illustrate the use of operand types in
- generating machine opcodes and discovering programmer errors. In
- each of the examples, the MOV instruction produces a different
- 8086 opcode, or an error. The symbols used in the examples are
- assumed to be defined as follows: BVAR is a byte variable, WVAR
- is a word variable, and LAB is a label. As you examine these MOV
- instructions, notice that, in each case, the operand on the right
- is considered to be the source and the operand on the left is the
- destination. This is a general rule that applies to all
- two-operand instruction statements.
-
- MOV AX,BX ; (8B) Move word register to word register.
- MOV AX,BL ; ERROR: Type conflict (word,byte).
- MOV CX,5 ; (B9) Move constant to word register.
- MOV BVAR,AL ; (A0) Move AL register to byte in memory.
- MOV AL,WVAR ; ERROR: Type conflict (byte,word).
- MOV LAB,5 ; ERROR: Can't use label/constant as dest. to MOV.
- MOV WVAR,SI ; (89) Move word register to word in memory.
- MOV BL,1024 ; ERROR: Constant is too large to fit in a byte.
-
-